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Stationarity  Detection  in  the  Initial  Transient  Problem 


Soren  Asmussen*  Peter  W.  Glynn^  Hermann  Thorisson* 


Abstract 

Let  X  =  be  a  stochastic  process  with  a  stationary  version  X* .  It  is  investigated 

when  it  is  possible  to  generate  by  simulation  a  version  X  ol  X  with  lower  initial  bias  than 
X  itself,  in  the  sense  that  either  X  is  stationary  (has  the  same  distribution  as  X')  or 
the  distribution  of  X  is  close  to  the  distribution  of  X* .  Particular  attention  is  given  to 
regenerative  processes  and  Markov  processes  with  a  finite,  countable  or  general  state  space. 

The  results  are  both  positive  and  negative,  and  indicate  that  the  tail  of  the  distribution  of 
the  cycle  length  r  plays  a  critical  role.  The  negative  results  essentially  state  that  without 
some  information  on  this  tail,  no  apriori  computable  bias  reduction  is  possible;  in  particular, 
this  is  the  case  for  the  class  of  all  Markov  processes  with  a  countably  infinite  state  space. 

On  the  contrary,  the  positive  results  give  algorithms  for  simulating  X  for  various  classes  of 
processes  with  some  special  structure  on  r,  for  example  finite  state  Markov  chains,  Markov 
chains  satisfying  a  Doeblin  type  minorization,  and  regenerative  processes  with  r  having  a 
bounded  (p+  l)th  moment  or  having  a  stationary  age  distribution  that  can  be  generated  by 
simulation. 

1  Introduction 

When  performing  a  steady-state  simulation,  simulators  are  often  concerned  with  the  problem  of 
dealing  with  the  initial  transient.  The  term  ‘initial  transient’  refers  to  that  initial  segment  of  the 
simulation  that  is  contaminated  by  bias  introduced  by  starting  the  system  in  some  state  that 
is  not  typical  of  the  long-run  behaviour  of  the  system.  The  observations  gathered  during  the 
initial  transient  are  therefore  not  representative  of  the  steady-state  behavior  of  the  system  and 
are  biased.  Perhaps  the  most  popular  means  of  dealing  with  the  initial  transient  is  to  discard 
the  observations  gathered  during  this  period.  In  other  words,  the  simulator  lets  the  simulation 
‘warm  up’  before  collecting  any  observations. 

Of  course,  the  key  question  is  to  determine  how  long  the  ‘warm  up’  period  must  be  for  a 
given  simulation.  An  essentially  equivalent  formulation  of  the  problem  is  to  identify  that  time 
at  which  the  initial  transient  terminates  and  steady-state  behavior  begins.  Since  steady-state 
behavior  is  characterized  by  stationarity  of  the  stochastic  process,  we  can  therefore  view  the 
initial  transient  problem  as  involving  the  determination  of  that  time  at  which  the  simulation 
is  behaving  like  a  stationary  stochastic  process.  This  paper  is  concerned  with  the  question  of 
existence  and  construction  of  such  stationarity  detection  times  (and  suitable  generalizations). 

This  is  a  problem  that  has  challenged  the  simulation  community  for  many  years.  Among  the 
papers  that  have  addressed  this  question  are  [9],  [12],  [30],  [31],  and  [23];  see  also  [11]  and  [19]. 
It  is  probably  safe  to  say  that  no  technique  yet  proposed  satisfactorily  solves  this  problem.  It 
is  also  worth  noting  that  when  one  estimates  the  steady-state  via  multiple  replications  of  the 
process,  the  ability  to  determine  the  end  of  the  initial  transient  is  enhanced.  The  basic  idea 
is  that  by  averaging  over  multiple  replicates,  much  of  the  variability  in  the  system  is  damped 


out,  so  that  the  convergence  to  steady-state  is  easier  to  determine.  Among  the  papers  that 
take  advantage  of  this  idea  are  [18],  [27],  [28].  In  the  current  paper,  our  interest  focuses  on 
stationarity  detection  rules  that  are  based  on  a  single  run  of  the  system. 

We  will  see,  in  Section  2  of  the  paper,  precisely  why  the  initial  transient  problem  has  been 
so  challenging.  We  will  prove,  in  a  mathematically  precise  sense,  that  without  some  restrictions 
on  the  class  of  simulations  to  be  considered,  there  can  exist  no  universally  satisfactory  means 
for  detecting  stationarity  in  a  stochastic  simulation.  This  negative  result  is  probably  expected 
and  suggests  that  any  successful  stationarity  detection  rule  will  need  to  take  explicit  advantage 
of  some  additional  structure  of  the  system  being  simulated. 

In  the  rest  of  the  paper,  we  complement  the  above  negative  result  with  positive  ones,  that 
are  perhaps  more  surprising.  In  Section  3,  we  show  that  it  is  possible  to  generate  a  r.v  Z 
having  the  stationary  distribution  of  a  finite  Markov  chain  {A'„},  using  only  simulated  values 
and  randomization,  and  in  Section  4  it  is  shown  (using  some  recently  developed  ideas  of  [25]) 
that  for  certain  classes  of  regenerative  stocastic  processes,  we  can  identify  a  random  time  T 
such  that  the  system  is  in  exact  stationarity  at  this  instant.  These  constructions  are  possible 
even  for  certain  systems  in  which  the  steady-state  distribution  is  not  analytically  available  and 
must  be  simulated.  The  approach  taken  here  to  developing  stationarity  detection  rules  strongly 
suggests  that  in  the  regenerative  setting,  one  must  take  advantage  of  apriori  knowledge  of  the 
tail  behavior  of  the  regenerative  cycle  length  random  variable. 

Section  5  discusses  settings  in  which  approximate  stationarity  can  be  achieved.  In  Section  6, 
we  provide  further  discussion,  and  Section  7  concludes  the  paper  with  some  illustrative  examples 
and  applications.  Unless  otherwise  stated,  all  proofs  are  deferred  to  the  Appendix. 

2  Stationarity  detection:  definitions  and  basic  theory 

We  shall  restrict  our  formulation  and  discussion  in  this  section  to  the  Markov  chain  setting. 
However,  the  ideas  described  here  can,  in  fact,  be  easily  extended  to  the  general  discrete-event 
simulation  context.  One  need  only  observe  that  if  one  views  the  typical  discrete-event  simulation 
at  transition  epochs,  one  can  make  the  process  Markovian  by  adding  supplementary  variables 
to  the  state  description  that  include  information  on  the  time  that  remains  before  the  'clocks’ 
corresponding  to  each  possible  event  will  trigger  a  state  transition  of  the  system.  It  therefore 
follows  that  a  discrete-event  system  can  be  written  cis  a  functional  of  a  discrete-time  Markov 
chain  taking  values  in  a  complicated  state  space  S  in  which  both  physical  state  and  clock  state 
is  recorded.  For  additional  information  on  this  way  of  looking  at  discrete-event  simulations,  see 
[14]. 

Suppose  that  X  =  is  a  Markov  chain  taking  values  in  S.  We  can  (without  any 

loss  of  generality)  view  X  as  being  defined  on  the  probability  space 

n  =  (5  X  (0,1))  X  (5  X  (0,1))  X  •••• 

A  typical  element  in  th<"n  takes  the  form  u  =  {(x„, Un)}„=o,i  •  sequence  X  can  be 

defined  via  the  coordinate  projections  Ar„(u;)  =  i„,  and  we  further  let  U  be  the  sequence  of 
random  variables  defined  by  I/„(w)  =  u„. 

Let  K  be  a  transition  kernel  defined  on  5,  so  that  K{x,B)  represents  the  probability  that 
tlip  chain  X  moves  from  x  into  5  C  5  in  one  step.  For  each  initial  distribution  pi  on  5.  we  can 
then  define  a  probability  distribution  Pfi,h'  on  via  the  formula 


IP  pi,h'  [Ao  €  Aq,  . . . ,  An  €  AntUo  €  Bq,  ...  ,Un  €  Bn] 

=  /  nidxo)  K{xo,dxi)---  K{xn-i,dxn)  ■  dyo  -  dyn- 

J  Aq  ''A\  An  J  Bq  Bn 
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Hence,  under  the  distribution  X  is  a  Markov  chain  having  initial  distribution  fi  and 

transition  kernel  A'.  Also,  t/  is  a  sequence  of  i.i.d.  uniform  (0,1)  r.v.’s  that  is  itself  independent 
of  X.  We  will  need  the  uniform  r.v.’s  in  order  to  define  randomized  algorithms  for  detecting 
stationarity.  Much  of  our  subsequent  discussion  will  involve  such  randomized  detection  rules. 

We  let  M  denote  the  subset  of  transition  kernels  on  5  such  that  for  each  A'  €  M,  there 
exists  a  unique  stationary  distribution  tta'. 

We  say  that  T  is  a  mndom  time  if  T  is  a  non-negative  integer-valued  r.v.  defined  on  Cl. 
For  n  =  0,1,...,  define  the  shift  6„  by  d„X  =  (X„,X„+],. ..).  Roughly  speaking,  our  goal  is 
to  construct  a  random  time  T  such  that  the  post-T  process  BtX  is  in  steady  state  (is  strictly 
stationary). 

Definition  2.1  Let  K  £  M.  The  random  time  Tis  said  to  be  a  stationarity  detection  time  for 
K  if  for  each  initial  distribution  fx 

^(iM^tX  €  •)  =  IP7r;,-.A  (X  €  ■)•  (2.1) 

One  way  to  construct  stationarity  detection  times  is  by  means  of  randomized  stopping 
times  [recall  that  a  random  time  T  is  c  randomized  stopping  time  if  for  each  n  there  exists  a 
deterministic  0-1  valued  function  /„  such  that  I{T  —  n)  =  /n(A'o,  Uo,-  -  ■ ,  Xn,Uny,  sometimes 
also  the  term  non- anticipating  is  used  for  a  randomized  stopping  time].  Such  rcindomized 
stopping  times  have  the  following  nice  property: 

Proposition  2,1  Let  K  €  Ad.  IfT  is  a  randomized  stopping  time  such  that  X-p  has  distribution 
TTA"  for  each  initial  distribution  pi,  then  T  is  a  stationarity  detection  time  for  K . 

The  proof  of  this  follows  immediately  from  the  strong  Markov  property  of  X. 

Ideally,  one  would  like  stationarity  detection  times  to  detect  stationarity  immediately  if  the 
initial  distribution  is  tta'.  Our  first  result  shows  that  no  such  stationarity  detection  time  typically 
exists. 

Proposition  2.2  Let  K  ^  M  and  assume  that  n  is  not  concentrated  on  any  single  point  x  £  S . 
Then  there  exists  no  stationarity  detection  time  T  for  K  such  that 


=  0)  =  1.  (2.2) 

Thus,  the  requirement  (2.2)  demands  too  much  from  the  random  time  T.  If  we  drop  (2.2), 
it  turns  out  that  stationarity  detection  times  can  often  be  constructed: 

Example  2.1  Suppose  that  5  is  finite  or  countably  infinite  and  that  A'  is  irreducible.  If  X  is 
positive  recurrent  under  A',  there  exists  a  unique  stationary  distribution  tta,  and  (by  applying 
inversion),  we  can  find  a  deterministic  function  g  such  that  g{Uo)  has  distribution  tta'.  Then 

T  =  inf{n  =  0,l...:X„  =  giUo)} 

is  a  randomized  stopping  time  such  that  Xp  has  distribution  tta',  and  we  may  apply  Proposition 
2.1.  '  □ 

However,  this  construction  obviously  ‘cheats’  by  constructing  the  function  g  (and  hence  the 
stopping  rule  T)  from  explicit  knowledge  of  tta'.  This  suggests  that  a  more  appropriate  formulation 
for  a  stationarity  detection  time  ought  to  somehow  forbid  the  simulator  from  using  explicit 
knowledge  of  the  stationary  distribution  to  construct  T. 

We  can  accomplish  this  by  requiring  that  T  work  uniformly  well  over  a  suitably  large  class 
A”  of  transition  kernels  K.  Being  defined  only  in  terms  of  the  simulated  data  X  and  17,  T  can 
not  explicitly  modify  itself  to  reflect  knowledge  of  the  various  stationary  distributions. 
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Definition  2.2  Let  M  C  M.  We  say  that  a  random  time  T  is  a  .V-uniform  stationarity 
detection  time  if  T  is  a  stationarity  detection  time  for  each  K  €  N'. 

Perhaps  surprisingly,  it  is  often  possible  to  construct  such  detection  times: 

Example  2.2  Suppose  again  that  S  is  finite  or  countably  infinite.  Without  loss  of  generality, 
we  can  take  5  to  be  {0,1,...}.  Let  M\  be  the  class  of  irreducible  positive  recurrent  transition 
matrices  K  defined  on  S.  For  16  5,  let 


1  ” 

F{X,x)  =  liminf  —  H^k  <  ^)- 
”  ”  A:=0 

For  a  fixed  but  arbitrary  K  €  Afj,  the  strong  law  of  large  numbers  for  A'  implies  that  F{X,-) 
is  almost  surely  equal  to  the  distribution  function  of  ir^',  and  thus 

Ti  =inf{n  =  0,l...:X„  =  E-\A:,f/o)} 

coincides  a.s.  with  the  T  of  Example  2.1.  Thus  Ti  is  a  stationarity  detection  time  for  the  given 
K  and  hence  for  all  K  ^  Mi-  □ 

Of  course,  here  the  difficulty  is  that  T  is  constructed  after  observing  the  entire  (infinite) 
sample  trajectory  of  X.  Such  a  random  time  T  can  not  be  implemented  in  a  practical  setting. 
This  motivates  restricting  attention  to  stationarity  detection  times  which  cire  implementable  in 
the  following  sense: 

Definition  2.3  A  r.v.  Z“  is  implementable,  if  there  exists  an  a.s.  finite  randomized  stopping 
time  fi  such  that  Z'  is  a  deterministic  function  of  fi,{Xi,Ui),...,(X0,U0)  alone. 

We  are  now  ready  to  state  our  main  non-existence  result  for  strong  stationarity  times.  It 
proves  that  well-behaved  stationarity  detection  times  T  fail  to  exist  even  when  one  restricts 
attention  to  Markov  chains  with  countably  infinite  state  space.  In  fact,  let  Ad 2  be  the  class  of 
aperiodic  irreducible  positive  recurrent  transition  matrices  K. 

Theorem  2.1  Assume  that  S  is  countably  infinite.  Then  there  exists  no  implementable 
uniform  stationarity  time. 

In  fact,  an  even  stronger  result  (Theorem  2.2  below)  can  be  proved.  Recall  that  the  total 
variation  distance  between  probability  measures  p  and  1/  on  5  is  defined  by 

(Ip  -  i^II  =  2  sup  \n{B)  -  v{B)\. 

BCS 


Definition  2.4  Let  Af  C  Af. 

(a)  The  family  Af  is  said  to  be  a  weak  uniformity  class  for  the  initial  transient  problem,  if  for 
each  e  >  0,  there  exists  an  implementable  r.v.  Z*(c)  such  that 

||Pp.K(^*(0  6-)-^K(-)ll<€  (2.3) 

for  any  initial  distribution  fi  on  S  and  any  K  6  Af. 

(b)  The  family  Af  is  said  to  be  a  uniformity  class  if  there  exists  an  implementable  r.v.  Z*  such 
that  IP^_^-(Z*  €  •)  =  ’'■A'(-)  for  any  initial  distribution  p  on  S  and  any  K  €  A^ 
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We  call  the  r.v.’s  Z*(6)  and  Z*  appearing  in  Definition  2.4  an  f-stationary  r.v.  and  stationary 
r.v.,  respectively.  If  Z*{f)  can  be  represented  as  for  some  random  time  T*'*,  we  call 

an  €-stationarity  time. 

Note  that  (a)  demands  only  that  the  marginal  distribution  of  Z*(f)  be  approximately 
stationary.  The  extension  from  stationarity  detection  times  to  stationary  random  variables 
is  motivated  by  the  fact  that  given  a  stationary  detection  r.v.  Z*,  one  can  simulate  a  strictly 
stationary  version  of  the  Markov  chain  by  starting  from  A'o  =  Z’,  and  given  a  e-stationary  r.v. 
Z’(f),  the  version  of  the  Markov  chain  started  from  A'o  =  ^’(f)  satisfies 

||P^.A-(A'n  €  •)  -  ^k(-)II  <  f- 

Theorem  2.2  Assume  that  S  is  countably  infinite.  Then  M2  is  not  a  weak  uniformity  class. 
This  follows  from  the  following  result: 

Proposition  2.3  Let  S  =  {0,1,2,...},  let  =  (fc-°htj€S  be  a  given  transition  matrix  in 
M,  and  let  be  the  set  of  transition  matrices  K  =  (ki}),jes  €  M  such  that  there  exists 

an  integer  A  =  such  that  fc.y  =  k\f  for  i,j  <  A.  Then  is  not  a  (-uniformity 

class  for  0  <  e  <  2. 

These  results  suggest  that  the  class  J\f  of  transition  kernels  needs  to  be  carefully  chosen  in 
order  that  one  have  any  chance  of  being  able  to  construct  a  uniformly  well  behaved  stationarity 
detection  rule.  The  remainder  of  this  paper  is  concerned  with  describing  the  type  of  information 
that  needs  to  be  present  in  A/”  so  as  to  permit  such  constructions. 


3  Simulation  of  stationary  finite  Markov  chains 

Our  main  result  on  finite  Markov  processes  is  the  following: 

Theorem  3.1  The  class  of  irreducible  Markov  chains  with  a  fixed  number  s  of  states  is  a 
uniformity  class. 

Thus,  there  exists  algorithms  generating  a  r.v.  having  the  stationary  distribution  rr  of  a  finite 
Markov  chain  using  only  simulated  values  and  randomization.  We  proceed  to  describe  one  such 
algorithm,  thereby  providing  a  proof  of  Theorem  3.1. 

The  first  step  is  to  translate  the  problem  into  a  problem  on  continuous-time  Markov  process 
by  Poissonification.  Indeed,  it  is  a  standard  fact  that  if  Vj,  V2t  •  •  •  are  i.i.d.  exponential 
(say  with  unit  rate),  then  the  process  {T(t)}  defined  by 

Y(t)  =  Xq,  0  <  t  <  Vj,  Y{t)  =  X„,  V\  +  •  •  ■  +  Vn-i  <  t  <  Vi  +  ■  •  ■  +  V n 


is  a  s-state  irreducible  Markov  process  with  the  same  stationary  distribution  as  {A„}.  Note 

that  {T(0}  does  not  necessarily  jump  at  V'l  + - h  Ki  but  only  if  A'„+i  ^  A'„.  In  terms  of  the 

transition  probabilities  ik,j)ij^E  for  {A'„},  {>"(<)}  has  intensity  A,j  =  kjj  for  jumping  from  i  to 
j  when  i  ^  j. 

The  next  step  is  to  observe  that  the  construction  of  a  stationary  detection  r.v.  for  {T(<)}  is 
easy  when  s  =  2,  e.g.  E  =  {1,2}  where 

—  *^21  _  -^12 
*  Ai2  +  A21  ’  *  Ai2  +  A21 

Indeed,  let  T,  be  the  first  holding  time  of  state  i,  i  =  1,2.  Then  Tj,  T2  are  exponential  with 
intensities  A12,  resp.  A21,  and  an  easy  calculation  shows  that 
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Thus,  we  have  the  following  algorithm; 

Algorithm  A  (Generating  a  Stationary  Detection  R.V.  Z  for  a  Markov  Chain 
{A'n}  With  Two  States  1,2  ) 

1.  Let  j  «—  1,  <  0 

2.  Generate  V ,  an  exponential  random  variable  with  unit  intensity,  and  A'l  starting  from 
Ao  —  1*  Let  j  < —  Ai,  t  ^ —  t  'V 

3.  Ifj  =  2,  let  Ti  ^  t; 
else  return  to  2 

4-  Repeat  steps  1,  2,  3  with  states  I  and  2  interchanged  to  generate  T2 

5.  IfTi  >  T2,  let  Z  ^  1; 
else  Z  <—  2 


The  last  step  is  to  treat  the  case  s  >  2  recursively  and  is  the  most  intricate.  We  shall  need  to 
introduce  the  /’-valued  process  defined  as  the  process  {y(0}  oi'  where  F  C  S 

(see,  e.g.,  [4]  pp.  13-14).  This  means  that  in  the  path  of  {1^(0)  we  delete  all  segments  where 
Y{t)  ^  F  and  glue  together  the  remaining  segments.  Algorithmically,  this  can  be  implemented 
as  follows: 

Algorithm  B  (Generating  the  First  Holding  Time  T^^\i)  And  the  Next  State 
y'(^'(i)  OF  |y<^)(t)|  Starting  From  y<^H0)  =  A'o  =  i  €  F) 

1.  Let  j  *—  i,  t  *—  0 

2.  Generate  V,  an  exponential  random  variable  with  unit  intensity.  If  j  6  F,  let  t  t  +  V . 

3.  Generate  Xi  starting  from  Xo  =  j-  Let  j  *-  A'l. 

4.  If  j  ^  i  and  j  £  F,  let  T^^^i)  ^  t,  *—  j; 

else  return  to  2 


It  is  well  known  that  the  stationary  distribution  of  |y^^^(t)|  is  obtained  by  conditioning 
TT  =  to  F: 

r(n  _ 

ttf 


X;  '  =  — -  where  Xf  =  ^  Xj. 


(3.5) 


}€F 


Also,  we  have  the  principle  of  local  balance:  when  |y^^'*'^^(t)|  is  stationarity,  the  rate  of  flow 
of  mass  from  F  to  G  is  the  same  as  the  flow  of  mass  from  G  to  F  (here  F,  G  are  disjoint  subsets 
of  5).  In  terms  of  the  intensities  for  means  that 


E(F+G).{F+G)  _  Y-  {F+G).{F+G) 

^jF  ' 

i6F  j€G 


(3.6) 


where  =  YIj^g  rewrite  (3.6)  as 

^F  2-  ^tG  -^G  2^  ^jF 

ieF  j^G 


(3.7) 
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Now  assume  that  we  can  generate  exponential  r.v.’s  having  parameters 

52ieF r®®?-  5ZjeG stationarity  detection  r.v.’s  having 

distributions  resp.  Then  as  in  (3.4), 


ip('7’(^+f')  V.  _  _ ^«gF  ^iG _ 

1  +  E,€0  »f E.6F 

_  _ ]: _ _ 

"  1 + ^  ’ 

where  we  have  used  (3.7)  for  the  third  equality.  That  is,  we  can  use  the  ordering  of 

to  decide  whether  should  be  in  F  or  G,  and  next  Z^^^  or  to  decide  which 

value  in  F,  G  is  the  appropriate.  Thus 

IP(2(/=’+G)  ^  j)  ^  iP(t},^+^)  >  T^^+‘^))P(Z<^>  =  i)  = 

for  i  6  f  as  desired,  and  similarly  IP(Z^^+^1  =  j)  =  for  j  £  G. 

To  construct  note  that  the  distribution  of  is  that  of  the  minimum  of 

exponential  r.v’s  Wi  with  intensity  for  the  ith  state  (i  £  F).  If  >  0,  we 

can  sample  a  Poisson  stream  with  intensity  by  repeatedly  starting  |y^^"'’‘^)(t)|  in  state 

i  and  accumulate  the  time  until  a  transition  to  G  occurs.  We  can  then  thin  this  stream  by 
generating  a  sequence  of  i.i.d.  copies  of  Z^^K  say  Z{^KZ2^\ —  The  nth  point  is  retained  if 
Zn^^  =  t,  thereby  obtaining  a  Poisson  stream  with  intensity  The  r.v.  W,  is  then 

the  first  epoch  of  the  thinned  Poisson  process.  In  practice,  this  construction  requires  a  small 
modification  since  may  be  zero  for  some  i.  However,  by  irreducibility  >  0  for 

at  least  one  i  £  F,  and  this  ensures  that  the  following  algorithm  is  valid  (here  ti  indicates  the 
amount  of  time  in  which  the  ith  Poisson  stream  has  been  simulated  so  far  and  s,  is  a  binary 
variable  indicating  whether  it  is  necessary  to  simulate  any  further): 

Algorithm  C  (Generating  if  it  is  Known  How  to  Generate  Z^^^  Z^^^) 

1.  Number  the  states  in  F  in  some  way,  say  F  =  {^oi  •  •  • » ^p-i } 

2.  Let  Tp  ■>— oo,  ti  ^  0,  Si  0,  i  =  0,..  .p  -  I 

3.  Let  i  *-  p  -  I 

4-  If  all  Sk  =  1,  go  to  10; 
else  t  <—  ( i  +  1 )  mod  p 


5.  If  Si  =  I,  return  to  4 

6.  Generate  r(^+*^>(^,),  T^^+^^li)  using  Algorithm  B  and  let  j 

t,  -  +  7’<^+^)(f.). 

IfTi  >  Tp,  let  Si  =  1  and  return  to  4 

7.  If  j  £  G,  generate  Z^^^  and  let  k  «—  Z^^^■ 
else  return  to  4 

8.  If  j  ^  k,  return  to  4; 

else  if  ti  <  let  Tp  <—  ti 


y(f+G)(^,), 
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9.  Return  to  4 

10.  Repeat  steps  7,. . .  ,9  with  F  and  G  interchanged  to  generate  To 

11.  IfTp  >  Tg,  generate  and  let  Z(^+^)  Z<'"); 

else  generate  and  let  »— 

To  generate  Z  =  Z^^\  one  can  then  start  by  noting  that  obviously  Z^^l  =  z  if  F  =  {«}  is 
a  one-point  set.  Using  Algorithm  C,  one  can  then  generate  Z*^'  for  all  two-point  sets;  then 
Z*^'  for  all  three-  or  four  point  sets  and  so  on.  Algorithmically,  this  is  particular  convenient  in 
programming  languages  (e.g.  Pascal)  allowing  recursive  procedure  calls. 

In  Section  6,  we  give  some  estimates  indicating  how  the  number  of  steps  needed  to  generate 
Z  may  depend  on  the  number  s  of  states. 


4  Simulation  of  stationary  regenerative  processes 

Let  5  be  a  state  space  endowed  with  a  metric  under  which  S  is  separable  (e.g.  IR"^).  \i  X  = 
{A'(<)}j>q  is  a  right-continuous  stochastic  process  taking  values  in  5,  we  say  that  AT  is  a  (non- 
delayed)  wide-sense  regenerative  process  if  there  exist  random  times  0  =  TIO)  <  T';  1)  <  . . .  such 
that 

i)  X(T(n)-|-  ■)  =  X(-)  for  n  >  1; 

ii)  T{n)  is  independent  of  X{T(n)  -f  •)  for  n  >  1. 

We  note  that  we  are  not  requiring  the  process  evolution  prior  to  time  T(n)  to  be  independent 
of  that  subsequent  to  r(n).  Instead,  the  post-T(n)  process  X{T{n)  +  •)  =  {^{Tin)  -|-  Oloo 
required  only  to  be  independent  of  the  time  T{n)  itself.  This  extension  of  classical  regeneration 
(known  as  wide-sense  regeneration)  turns  out  to  be  useful  in  the  study  of  Harris  recurrent 
Markov  chains;  see  pp.  150-158  of  [4].  [We  note  that  a  discrete  time  sequence 
be  analyzed  by  studying  the  associated  continuous  time  process  {A'(t)}j>o,  where  A'(0  =  A'j^^j 
and  [tj  denotes  the  greatest  integer  less  than  or  equal  to  T.j 

We  say  that  AT*  =  {.X^*(0}t>o  ^  stationary  version  of  AT  if  AC*  is  a  strictly  stationary 

process  possessing  an  associated  random  time  T*(0)  such  that 

i)  X%T'{0)F-)  =  X{-) 


ii)  r*(0)  is  independent  of  AC*(T*(0)  -t-  •). 

In  the  remainder  of  this  section,  we  will  describe  two  settings  in  which  AC*  can  be  simulated, 
given  the  ability  to  simulate  X. 

Let  Tn  =  T(n)  -  T(n  -  1)  for  n  >  1,  write  r  =  rj  for  the  generic  cycle  and  set  m  =  Et.  The 
following  description  of  the  stationary  version  X‘  is  given  in  [25]: 


Proposition  4.1  Assume  m  <  oo.  Suppose  that  x'  = 
process  with  associated  random  time  T'{0)  satisfying 


t>o 


is  an  S-valued  stochastic 


E(T'(0)  €  dx)  =  —]P(T  e  dx) 


(4.8) 


and 

E(X'e-|T'(0)  =  i)  =  E(ACe-|T  =  a:)  (4.9) 

for  each  i  >  0.  If  U  is  a  uniformly  distributed  r.v.  on  [0, 1]  and  independent  of  X  ,  then  AC*  is 
a  stationary  version  of  X  where 

X'{t)  =  X'{UT'{0)  +  t),  t>0. 
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There  is  an  intuitive  explanation  for  why  this  construction  should  give  a  stationary  version. 
Imagine  that  the  process  X  has  been  running  for  a  time  interval  of  length  t  and  that  we  pick 
a  point  T]  uniformly  in  the  interval  [0,t].  Then,  the  post-;;  process  X(t]  +  •)  converges  to  a 
stationary  version  X*  of  X  when  t  oo.  The  possibility  of  the  point  t]  ending  up  in  a  given 
cycle  interval  of  length  x  should  be  proportional  to  x  and  the  relative  number  of  such  intervals 
is  P(r  €  dx),  which  (together  with  the  ‘normalisation’  /  xP(r  G  dx)  =  m)  gives  us  (4.8).  Given 
the  length  of  the  picked  cycle,  it  should  behave  as  an  ordinary  cycle,  i.e.  (4.9)  should  hold. 
Finally,  the  picked  point  should  lie  uniformly  within  its  ‘length-biased’  interval,  independently 
of  everything  else. 

Suppose  that  r  has  a  density.  In  that  case,  (4.8)  states  that  the  ratio  of  the  density  of  T  (0) 
to  that  of  T  is  x/m.  Hence,  if  the  r.v.  r  is  a.s.  bounded  above  by  the  deterministic  finite  constant 
a,  say,  the  ratio  will  be  bounded  above  by  a/m.  It  is  well  known  (see,  for  example,  [19])  that 
the  boundedness  of  the  ratio  permits  one  to  generate  the  r.v.  T  (0)  via  acceptance-rejection 
(given  an  algorithm  to  generate  ordinary  regenerative  cycle  lengths  with  the  distribution  of  r). 
A  similar  analysis  is  valid  without  assuming  the  existence  of  denities,  in  particular  when  r  is  a 
discrete  r.v. 

Combining  this  acceptance-rejection  idea  and  Proposition  3.1  leads  to  the  following  algorithm 
for  generating  the  stationary  version  X"  corresponding  to  X : 

Algorithm  D  (Generating  a  Stationary  Version  X’  When  the  Cycle  Lengths  Are 
Bounded,  r  <  a  <  oo  a.s.) 

1  n  ♦—  1 

2  Generate  X  over  [T{n  -  l),T(n)) 

3  Generate  Vn,  a  uniform  r.v.  on  [0,1] 

4  If  aVn  <  r„,  go  to  5; 

else  n  ♦-  n  +  1  and  return  to  2 

5  Generate  U,  a  uniform  r.v.  on  [0,1],  and  let 

X'it)  =  X{T{n-l)  +  UTn  +  t),  t>0. 

We  note  that  r*(0)  =  (1  -  U)t„  and  that  the  probability  of  acceptance  at  step  4  of  the  above 
algorithm  is  m/a;  the  expected  number  of  times  that  the  test  at  step  4  is  executed  is  therefore 
a/m. 

Algorithm  A  implies  that  the  class  of  wide-sense  regenerative  processes  with  cycle  lengths 
bounded  above  by  a  fixed  constant  form  a  uniformity  class.  Because  of  the  intimate  relation 
between  regenerative  processes  and  recurrent  Markov  chains,  it  is  also  clear  how  to  translate 
this  into  a  result  about  Markov  chains. 

Unfortunately  it  is  only  rarely  the  case  that  the  cycle  lengths  of  a  regenerative  process  are 
bounded  (but  see  Example  7.2  for  an  interesting  exception).  However,  [25]  provides  us  with 
the  tools  necessary  to  develop  a  second  interesting  class  of  wide-sense  regenerative  processes  for 
which  generation  of  the  stationary  version  X"  is  possible. 

Proposition  4.2  Assume  m  <  oo.  Then  X"  is  a  stationary  version  of  X  if  there  exists  an 
associated  random  time  (i  such  that 

IP(0  e  dx)  = —JP{T  >  x)  (4.10) 

m 

and 

P(X'  €  -1/3  =  t)  =  PlX(t  +  ■)  e  fl|r,  >  t)  (4.11) 

for  each  x,t  >  0. 
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We  note  that  (4.10)  states  that  (3  has  the  stationary  age  distribution  (or  the  stationary 
excess  life  distribution)  for  the  wide-sense  regenerative  process  X\  see  p.  116  of  [4].  Thus,  (4.11) 
basically  asserts  that  the  stationary  version  X'  can  be  obtained  by  conditioning  the  original 
process  X  in  such  a  way  that  the  cycle  currently  in  progress  has  the  appropriate  stationary  age 
distribution. 

The  key  to  applying  Proposition  3.2  to  simulate  X"  is  the  ability  to  generate  the  r.v.  (i  from 
the  distribution  specified  by  (4.10).  We  say  that  the  stationary  age  distribution  is  simulatable  if 
such  variate  generation  is  possible.  Proposition  3.2  immediately  establishes  the  validity  of  the 
following  algorithm: 

Algorithm  E  (Generating  a  Stationary  Version  X'  When  the  Stationary  Age 
Distribution  is  Simulatable) 

1  Generate  a  r.v.  0  with  distribution  (4-10) 

2  71  —  1 

3  Generate  X  over  [^(ti  -  1),7’(7i)) 

4  //r„  >  (3,  go  to  4; 

else  Ti  «—  71  +  1  and  return  to  2 

5  Let 

X’it)  =  XiT{n- 1)4-13 +  t),  t>0. 

We  note  that  r*(0)  —  Tn  -  (3.  If  /3  =  z,  the  probability  of  acceptance  in  step  4  given  (3  ~  x 
is  IP(r  >  z),  so  that  the  expected  number  of  times  N  the  test  at  step  4  is  performed  is 

1  /■'="  1  1  1 
F(r  >  z)  ’  Jo  P(r  >  z)  m  '  ’  Jo  m 

(unless  the  support  of  r  is  bounded,  say  a  is  the  supremum;  then  we  get  a/m  precisely  as  in 
Algorithm  D).  Thus  typically  Algorithm  E  has  an  infinite  expected  sample  size.  This  indicates 
that  applications  that  require  repeated  use  of  Algorithm  E  (see,  for  example,  Section  6)  will 
in  practice  have  enormous  sample  sizes,  whereas  the  problem  is  less  serious  if  the  algorithm  is 
only  used  once,  for  example  when  starting  a  long  simulation  run  is  a  strictly  stationary  way  to 
eliminate  bias.  [It  is  tempting  to  circumvent  the  problem  by  generating  (3  in  step  3  instead,  but 
this  idea  does  not  lead  to  the  correct  distribution  of  X‘.] 

It  may  be  noted  that  the  r.v.  N  does  not  represent  an  extreme  instance  of  a  r.v.  with  a  heavy¬ 
tailed  distribution.  Suppose  for  example  that  r  has  a  geometric  distribution  (see  Examples  7.4, 
7.5  below)  or,  more  generaUy,  that  P(r  >  z)  <  for  some  a  >  0.  By  Jensen’s  inequality, 

JE[N^\I3  =  z]  <  W[N\(3  =  z]  =  (P(r  >  z)-p 

for  p  <  1  and  hence 

yoo  1  yoo  1  yoo 

EiVP  =  /  EliVn/?  =  x]JP(0  €  dz)  <  —  /  (P(r  >  z)*-Pdz  <  —  /  e-<*-P>"^dz  <  00. 

Jo  Tn  Jo  rn  Jo 

Also,  one  may  note  that  once  /?  =  z  has  been  picked  in  step  1,  the  conditional  expected  sample 
size  1/P(t  >  z)  is  finite. 

One  might  initially  expect  that  the  only  case  when  the  stationary  age  distribution  is  simulatable 
is  that  where  the  stationary  distribution  of  X  is  known  in  closed  form,  in  which  case  one  can 
simulate  the  stationary  version  X”  from  this  distribution  explicitly.  This,  however,  is  not  the 
case;  see  Examples  7.3  and  7.4  for  non-trivial  applications  of  Algorithm  E. 
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Algorithm  B  implies  that  the  class  of  wide-sense  regenerative  processes  with  a  given  simulatable 
stationary  age  distribution  is  a  uniformity  class. 

It  turns  out  that  Algorithms  D  and  E  can,  in  fact,  be  extended  beyond  the  setting  of  wide- 
sense  regenerative  processes.  Variants  of  these  algorithms  can  be  developed  for  synchronous 
processes;  see  [26]  for  details.  A  synchronous  process  is  one  in  which  the  ‘cycles’  form  a  stationary 
sequence,  but  where  the  dependency  structure  between  cycles  can  essentially  be  arbitrary.  These 
processes  arise  naturally  as  Palm  versions  of  stationary  point  processes;  see  [22].  Because  it  is 
not  clear  to  the  authors  how  this  would  show  up  in  a  simulation  context,  we  don’t  discuss  this 
further  here. 

5  Weak  uniformity  classes 

We  start  by  showing  that  for  a  certain  class  of  Markov  chains,  one  can  calculate  a  priori 
estimates  on  the  rate  at  which  the  system  converges  to  steady-state,  and  thereby  construct  - 
stationarity  detection  times  which  are  deterministic.  In  the  finite  state  space  setting,  estimating 
the  convergence  rate  essentially  amounts  to  calculating  a  bound  on  the  eigenvalue  of  K  having 
the  second  largest  modulus,  since  this  is  the  parameter  that  determines  the  rate  at  which  the 
nth  power  of  an  irreducible  transition  matrix  converges  to  its  limit.  In  any  case,  given  an  upper 
bound  on  the  rate  at  which  the  system  converges  to  steady-state,  we  can  choose  a  time  T 
so  that  the  total  variation  distance  to  the  steady-state  distribution  is  arbitrarily  small.  The 
deterministic  time  T  can  then  be  used  in  (2.3)  to  obtain  an  appropriate  uniform  bound  on  the 
total  variation  distance. 

We  say  that  a  transition  kernel  K  satisfies  a  (A,(p,m)  minorization  if 

x€S.  (5.1) 

Here  0  <  A  <  1,  v?  is  a  probability  distribution  on  5  and  m  >  1  an  integer  (A'”*(x,19)  denotes 
the  probability  that  the  chain  X  moves  from  x  to  .0  in  m  transitions).  Some  discussion  of 
condition  (5.1)  is  given  in  Remark  4.1  below. 

The  following  result  is  well-known  and  straightforward  to  show  via  coupling  (see,  e.g.  [20]): 
Proposition  5.1  If  K  satisfies  a  (A,<p,m)  minorization,  then  K  G  M  and 

sup  ||A'”(x,.)  -  7rA-(-)||  <  (1  -  A)l"/-J.  (5.2) 

i^S 

It  follows  that  by  choosing  n  sufficiently  large,  we  can  make  (1  -  A)!^"/’"-!  arbitrarily  small.  This 
immediately  proves  that  one  can  construct  deterministic  e-stationarity  detection  times  for  the 
above  class  of  systems. 

Theorem  5.1  Fix  A  >  0  and  m  >  1,  and  let  M3  be  the  family  of  transition  kernels  defined  on 

S  such  that  K  satisfies  a  {X,(p,m)  minorization  for  some  y>.  Then  M3  is  a  weak  uniformity 

class,  and  T{e)  is  an  e-stationarity  detection  time  when  T{e)  is  chosen  as  the  least  integer  n 
having  the  property  (1  —  A)-”^”*-l  <  e. 

We  will  see  later  (Example  7.3)  that  M3  is,  in  fact,  a  uniformity  class.  In  order  to  achieve  strict 
stationarity  (rather  than  f-stationarity),  we  will  use  randomized  stopping  times  (rather  than 
the  deterministic  times  of  this  section). 

Remark  5.1  Condition  (5.1)  is  closely  related  to  the  Doeblin  condition  studied,  for  example, 
in  Chapter  5  of  [10]  (in  fact,  it  is  easy  to  show  that  condition  Do  stated  on  p.  221  of  [10] 
implies  (5.1)).  In  the  discrete  state  space  setting,  (5.1)  is  equivalent  to  requiring  that  the  m- 
step  transition  matrix  has  a  column  in  which  the  elements  are  uniformly  bounded  away  from 
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zero  (in  the  case  of  a  finite  5,  (5.1)  holds  for  some  m)  if  and  only  the  Markov  chain  is 

aperiodic  and  irreducible).  Typically,  we  would  expect  (5.1)  to  hold  when  S  is  compact  and 
I\(x,A)  satisfies  some  continuity  requirements  [in  fact,  in  (5.1)  we  may  preliminarily  for  a  fixed 
X  take  m  =  m(x)  where  there  is  positive  probability  of  coupling  to  the  stationary  version  in 
m  steps  starting  from  x,  cf.  Lemma  2.2  of  [.5];  by  continuity,  the  same  m  will  then  serve  in  a 
neighbourhood  of  x,  and  by  compactness,  a  finite  m  will  do  for  all  a:]. 

It  turns  out,  in  fact,  that  the  class  of  chains  for  which  deterministic  detection  times  work 
uniformly  well  over  all  possible  initial  distributions  ix  is  precisely  described  by  the  set  of  kernels 
K  satisfying  (5.1).  This  follows  by  the  following  partial  converse  to  Theorem  4.1:  suppose  that 
for  0  <  (  <  Ij'l  there  exists  a  deterministic  time  T(f)  =  n  such  that  ||Px.A'  (A'„  €  •)-’rA  (-)ll  <  c 
for  each  x  E  S.  Then,  there  exists  (A,</),m)  such  that  K  satisfies  a  (A,(p,m)  minorization.  (The 
proof  is  easy  and  therefore  omitted).  O 


One  difficulty  with  applying  Algorithm  D  of  Section  3  is  that  it  requires  that  the  cycle 
lengths  be  bounded.  Very  few  regenerative  processes  have  this  property,  although  the  class  of 
(s,  5)  inventory  systems  is  an  notable  exception  (see  Example  7.2  for  further  details).  It  is  worth 
noting  that,  in  general,  boundedness  of  the  regeneration  times  does  not  imply  the  existence  of 
a  deterministic  stationarity  detection  time  T ;  see  [15]  for  a  discussion  of  the  class  of  chains  for 
which  such  deterministic  times  exist. 

On  the  other  hand,  one  might  hope  that  the  application  of  an  appropriately  derived  truncation 
technique  to  the  cycle  length  distribution  would  enable  one  to  use  Algorithm  AE  to  construct  e- 
stationarity  detection  times.  The  development  of  such  a  methodology’  is  given  in  the  Appendix, 
where  we  show  the  validity  of  the  following  algorithm; 

Algorithm  F  (Generating  an  f-STAXioNARiTy  Detection  Time  T(()  When  an  Upper 
Bound  7  on  IErP+*  is  Known) 

1  Calculate  a  =  a(€)  =  (47/c^)*/p 

2  n  ^  1 

3  Generate  X  over  {T{n  —  l),T(7i)) 

4  //  Tn  <  a,  go  to  5; 

else,  n  n  +  1  and  return  to  3 

5  Generate  a  uniform  r.v.  on  [0,1] 

6  If  aVji  <  Tn,  go  to  7; 

else,  n  —  n  +  I  and  return  to  3 

7  Generate  U ,  a  uniform  r.v.  on  [0, 1],  and  let  T(()  =  T{n  -  1)  + 

Thus,  the  class  of  regenerative  processes  with  JErP"*"^  <  c  is  a  weak  uniformity  class 


6  Further  discussion 

An  underlying  theme  of  Sections  2  through  5  is  that  the  tail  of  the  distribution  of  the  cycle 
length  r  plays  a  critical  role  in  whether  one  can  construct  c-stationarity  times.  In  particular, 
we  have  positive  results  for: 

i)  finite-state  Markov  chains  (stationarity  detection  r.v.'s  are  constructed  in  Algorithms  A, 
B,  C) 
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ii)  wide-sense  regenerative  processes  with  bounded  cycle  lengths  (Algorithm  D  constructs 
stationarity  detection  times) 

iii)  wide-sense  regenerative  processes  with  simulatable  stationary  age  distribution  (Algorithm 
E  constructs  stationarity  detection  times) 

iv)  wide-sense  regenerative  processes  with  bounded  [p  +  l)th  moment  of  the  cycle  length 
(Algorithm  F  constructs  c-stationarity  detection  times) 


On  the  other  hand,  our  principal  negative  result  (Theorem  2.2)  arises  in  a  setting  in  which  no 
control  whatsoever  is  placed  on  the  behavior  of  the  cycle  length  distribution. 

The  critical  role  of  the  tail  behaviour  of  the  cycle  length  distribution  in  developing  initial 
transient  detection  algorithms  comes  perhaps  as  no  surprise,  given  the  intimate  relationship 
between  this  tail  behavior  and  rates  of  convergence  for  regenerative  processes  (see,  for  example, 
[24]). 

One  obvious  question  that  is  of  interest  to  the  simulator  is  whether  the  ability  to  generate 
a  stationary  version  of  the  process  can  be  used  to  obtain  a  variance  reduction  in  the  context  of 
steady-state  simulation.  In  particular,  suppose  that  X  =  {A'(<)}i>o  ^  real-valued  regenerative 

process  in  the  classical  sense  (with  i.i.d.  cycles)  for  which  IE|yj^  +  r^|  <  oo,  where  F,  = 
J^^-}_y)\X{s)\ds-,  to  avoid  trivial  cases,  assume  that  X  is  not  constant  (A'(t)  =  a).  Letting 
Q'i(<)  denote  the  time  average  (sample  mean)  Jq  X(s)ds,  it  is  well  known  that 

ai(i)  a  =  EA'*(0)  =  — EFi  =  — E  f  X(s)ds. 

m  Tn  Jo 

The  conventional  approach  for  estimating  the  steady-state  mean  a  is  to  use  oi{t)  as  point 
estimator,  and  under  the  conditions  stated. 


V^(ai(t)  -  a)  —  cTi  N(0, 1),  t oo. 


(6.3) 


where  ctj  =  ETj^/m. 

However,  the  ability  to  simulate  a  stationary  version  of  the  regenerative  process  suggests 
the  following  alternative  estimator  Q2(<)-  Assume  that  the  cycle  lengths  are  bounded  and  that 
algorithm  D  is  used  to  find  a  stationarity  detection  time  Ti(0);  then  EA'(Ti(0))  =  EA'*(0)  =  q. 
Let  Ai  =  A'(Ti(0)),  proceed  to  the  next  cycle  and  execute  Algorithm  D  a  second  time  to  produce 
a  second  independent  copy  A2  of  Ai.  Going  on  in  this  way,  the  simulation  of  X  over  [0.  <]  will 
produce  a  random  number  x(f)  of  i.i.d.  copies  Ai,  A2, . . . ,  of  A'xi(o)^  3-nd  we  can  therefore 
let 

,  x(<) 


Q2(0  =  < 


0 


x(t)  =  o 


Under  our  moment  assumptions,  it  follows  that  (see  [16])  O2(0  ^  «  sind 


v/F(Q2(<)-a)^  <72N(0,1),  f^oo,  (6.4) 

where  =  VarA'xj(o)  •  Er^,  and  is  the  amount  of  time  for  which  X  needs  to  be  simulated 
in  order  that  a  copy  of  A’jjjo)  be  calculated  (r*  is  the  sum  of  the  cycle  lengths  up  to  the  first 
accepted  cycle;  the  notation  indicates  that  may  be  thought  of  as  a  regeneration  point,  cf. 
the  proof  of  Proposition  6.2  below).  The  following  result  is  shown  in  the  Appendix: 


Proposition  6.1  For  any  regenerative  process  satisfying  the  conditions  of  Algorithm  D.  one 
has  (Tj  >  (xj. 
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Thus,  one  never  obtains  an  efficiency  improvement  by  favoring  Q2(0  over  Qi(<).  This  may  not 
appear  surprising,  as  a2{t)  trhrows  away  a  great  deal  of  information  that  is  incorporated  in 
Qi(<).  Nevertheless,  a  more  sophisticated  idea  will  indeed  produce  variance  reduction: 

Proposition  6.2  Under  the  assumptions  of  Algorithm  D,  there  exists  a  constant  b  (depending 
on  X )  such  that  the  estimator  03(1)  =  (1  -  b)ai{t)  +  ba2{t)  satisfies 

\/t(a3(0  -  a)  —  CT3N(0,  1),  t  —  00,  (6.5) 

with  (T3  <  <7j.  Furthermore,  b  can  be  consistently  estimated.  I.e.,  there  exists  6’(t)  such  that 
b''{t)  can  be  evaluated  from  the  simulation  in  [0,  t]  and  b’{t)  6,  /  — ►  00,  and  then  (6.5)  holds 

with  03(0  replaced  by  (1  -  6*(f))ai(f)  +  6*(<)a2(t). 

The  evaluation  of  6  and  the  construction  of  b’{t)  is  given  in  the  proof  (based  upon  a  slightly 
tricky  application  of  linear  control  variates)  in  the  Appendix.  It  will  also  be  seen  that  unless  one 
has  a  process  with  a  very  special  dependence  structure,  the  strict  inequality  holds.  The 

construction  is,  however,  somewhat  complicated  and  also  it  is  the  feeling  of  the  authors  that  the 
variance  reduction  which  can  be  obtained  in  this  way  will  seldom  be  very  substantial.  Further, 
even  though  the  control  variate  idea  carries  over  to  Algorithm  E  when  simulating  a  fixed  number 
of  cycles  of  generic  length  r*,  the  corresponding  estimator  can  never  compete  with  ai(t)  when 
we  discuss  efficiency  in  terms  of  the  simulation  run  length  t,  because  Er*  =  00. 

As  a  consequence,  we  do  not  believe  that  the  main  simulation  contribution  of  this  paper 
lies  in  the  area  of  variance  reduction.  Rather,  the  focus  is  on  the  initial  transient  problem. 
The  idea  is  to  produce  estimators  with  lower  bias,  without  adversely  affecting  the  asymptotic 
variance  or  the  length  of  the  simulation.  Our  results  contribute  to  this  by  giving  insight  into 
how  to  construct  a  random  time  r(e)  such  that  the  post-T(e)  process  is  close  to  stationarity. 
The  estimator 

04(0  =  )  [\x{T{e)  +  s)ds 
t  Jo 

will  typically  have  better  bias  characteristics  than  tti(<)'  without  affecting  the  asymptotic 
variance  (since  \/i(a4{t)  —  a)  ^  aiN(0, 1)  as  f  — ►  00)  or  the  order  of  magnitude  of  the  length  of 
the  simulation  for  large  t. 

A  second  important  possibility  that  the  results  of  this  paper  create  is  the  ability  to  use 
simulation  to  numerically  calculate  upper  bounds  on  the  rate  of  convergence  of  a  stochastic 
system  to  its  steady-state.  Such  upper  bounds  are  currently  of  great  interest  to  the  probability 
community;  see,  for  example,  [3].  As  stated  earlier,  in  the  finite  state  Markov  chain  setting 
this  is  tantamount  to  using  simulation  to  numerically  calculate  an  upper  bound  on  the  second 
eigenvalue  A  of  the  transition  matrix  of  the  chain.  In  this  case,  A  may  often  be  available  by  other 
means,  but  for  even  slightly  more  complicated  processes  the  difficulties  are  formidable  (e.g.  for 
queues  this  convergence  rate  is  related  to  the  concept  of  relaxation  time,  see  [4]  Ch.  III.  10  for 
an  example).  What  we  can  do  is  to  use  the  method  of  coupling  (see,  e.g.,  [4],  Ch.  VI. 2,  for 
some  basic  discussion  and  [20]  for  a  more  comprehensive  treatment)  to  numerically  calculate 
the  rate  of  convergence.  The  idea  is  to  simultaneously  simulate  both  a  stationary  version  X" 
and  the  non-stationary  version  X  of  the  process  (the  techniques  of  this  paper  would  be  used 
to  generate  AT*),  in  such  a  way  that  the  coupling  time  is  finite;  by  the  coupling  time  we  mean 
a  random  time  k  with  the  property  that  X{t)  =  X*{t),  t  >  k.  For  a  positive  recurrent  Markov 
chain  with  a  finite  or  countably  infinite  state  space  S  we  may  start  by  simulating  X'  and  X 
independently,  take  k  to  be  the  first  n  such  that  Xn  =  A'*  and  let  the  processes  be  identiccil 
after  k,  whereas  for  non-Markovian  processes  or  processes  with  a  continuous  component  of  the 
state  space  slightly  more  intricate  procedures  may  be  needed  (see,  e.g..  Example  6.1  below). 
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In  any  case,  the  tail  of  the  distribution  of  k  gives  an  estimate  of  an  upper  bound  on  the  total 
variation  distance  between  the  distributions  of  the  stationary  and  non-stationary  versions, 

||IP(A'(t)  €  ■)  -  IP(A'*(<)  €  Oii  <  2IP(k  >  t),  (6.6) 

and  by  simulating  i.i.d  replications  ki,...,ka?  of  k,  we  can  calculate  empirical  bounds  on 

P(k  >  t). 

The  ability  to  generate  a  stationary  version  of  the  process  can  have  additional  benefits,  as 
well.  For  example,  one  can  estimate  quantiles  of  the  stationary  distribution  by  generating  i.i.d. 
samples  of  the  process  in  stationarity  (in  much  the  same  way  as  O2(0t  is  constructed).  The 
generation  of  confidence  intervals  for  quantiles  in  the  i.i.d.  setting  is  less  compbcated  and  more 
straightforward  than  that  in  the  dependent  context,  although  it  is  likely  that  the  asymptotic 
variance  of  this  new  estimator  is  worse  than  that  of  the  traditional  one.  More  generally, 
traditional  steady-state  estimators  like  ai(t)  only  capture  features  of  the  one-dimensional  distribu¬ 
tion,  which  does  not  always  tell  the  whole  story;  see  the  discussion  in  [29]. 

7  Examples 

Example  7.1  We  shall  give  some  estimates  indicating  how  the  run  length  of  the  recursive 
Algorithm  C  may  depend  on  the  number  s  of  states  of  the  Markov  chain  {A'n}.  A  difficulty 
is  that  this  run  length  in  general  appears  to  depend  on  the  transition  probabilities  kij  in  a 
complicated  way,  and  that  it  thus  may  be  difficult  to  make  comparisons  for  different  values  of  s. 
We  shall  here  consider  an  extremely  simple  example,  a  chain  that  goes  to  any  other  state  with 
equal  probabilities  (thus  kij  =  l/(s  -  1)  for  i  ^  j,  ka  =  0).  By  the  ‘run  length’  we  understand 
the  expected  number  of  steps  of  {Xn}  that  need  to  be  generated  before  Z  =  is  observed. 

Consider  first  the  version  of  the  algorithm  where  one  state  at  a  time  is  added.  That  is,  in 
Algorithm  C,  F  is  a  one-point  set.  Let  rit  denote  the  run  length  needed  to  create  Z^^^  when  A 
has  t  elements.  Due  to  the  special  structure  of  the  kij,  rit  does  not  depend  on  A,  and  obviously 
Til  —  0,  n,  =  £3.  Now  let  G  have  t  elements  and  F  one,  say  i.  To  create  Algorithm  C  goes 
through  the  states  in  G  in  succesion,  creates  one  step  of  {Xn}  at  a  time  and  observes  whether 
the  next  value  is  i.  The  expected  amount  of  time  required  for  this  to  turn  out  succesfully  is 
s  —  1.  Then  Z^^^  is  generated,  and  the  algorithm  stops  strictly  later  than  at  the  time  when  the 
observed  value  of  is  the  state  from  which  a  transition  to  i  occured.  The  probability  of  this 
last  event  is  l/(<  -  1),  and  thus  rit+i  >  (s  —  1  +  nt){t  —  1),  from  which  it  follows  that  n2  >  5  -  F 
^3  >  ^2,  ^4  >  2713,  ris  >  3712,  and  thus 


4  =  >  (s  -  1)!  .  (■.") 

Now  assume  instead  that  the  recursive  step  is  carried  out  by  letting  F  and  G  be  of  the  same 
order  of  magnitude.  For  convenience,  let  s  =  2^  and  let  denote  the  run  length  needed  to 
create  when  A  has  2*  elements.  Assume  that  F  and  G  has  both  2^  elements.  An  upper 
bound  on  the  run  length  needed  to  create  is  2^  (the  number  of  states  in  G)  multiplied  by 
the  expected  number  of  steps  needed  to  create  an  event  in  the  Poisson  stream  with  intensity 
Zj^^ Arguing  similarly  as  above,  this  number  is 

oN  _  1 

Thus 

2^  —  1 

mk+\  <  2  •  2*(-^^  -I-  7n/t)2*  -I-  m^. 
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where  the  last  ruk  comes  from  step  11  of  Algorithm  C.  Letting  ml  =  max|mA:,(2^  -  1)/2*'|, 
•it  follows  that  mj  <  2^"^^,  and  thus 

£2^  =  m/M  <  ml;  <  ml  < 

Substituting  s  =  2^,  this  upper  bound  is 

4^2glog(5'»«  2)2)  _  ^^5_^2.081og* 

so  that  ‘doubling  up’  of  states  leads  to 

4  =  0(52‘>9log*)^  (7  8) 

which  is  clearly  much  better  than  the  lower  bound  (7.7)  obtained  by  adding  one  state  at  a  time. 

For  processes  with  a  special  structure  for  the  transition  graph,  these  estimates  can  be 
improved  somewhat.  Assume,  for  example,  that  S  =  |l,...,s  =  2^|  and  that  it  is  known 
that  {A'n}  has  birth-death  structure.  That  is,  kij  is  only  non-zero  for  ji  =  i  -  1  or  j  =  z  -f  1 
when  i  =  2, . . . ,  s  -  1,  when  i  =  0  only  for  j  =  0  or  j  =  1,  and  when  i  =  s  only  for  j  =  s  or 
j  =  5-1.  Proceeding  again  by  ‘doubling  up’  the  number  of  states,  this  leads  to  Algorithm  C 
being  applied  for  F,  G  neighbouring  intervals  of  length  2*^  (fc  =  1, . . . ,  A  —  1 ).  However,  when 
generating  To,  we  need  not  search  all  states  of  G  to  watch  for  a  transition  to  F,  but  only  the 
state  neighbouring  to  F.  If  all  non-zero  entries  of  K  are  1/2,  is  uniform  on  S,  and  we  can 
argue  as  above  to  get 

mk+i  =  2(2  -I-  mk)2^  +  m*. 

AsymptoticaUy,  this  is  easily  seen  to  lead  to  4  =  The  exact  values  for  small  N  are 

given  in  the  following  table; 


N 

ID 

3 

4 

5 

6 

7 

8 

9 

10 

s 

ID 

MM 

8 

16 

32 

64 

128 

256 

512 

1024 

rrifc 

ID 

wm 

268 

4588 

151.468 

WRlTiHi 

KiannM 

IQQil 

A  further  possibility  for  reducing  the  run  length  is  that  7r(5i)  may  be  known  for  some 
partitioning  of  5,  5  =  5i  -b  •  •  ■  +  Sq,  so  that  we  can  first  select  one  of  the  5,  with  the  known 
probabilities  and  next  generate  only  An  example  would  be  random  environment  models 

with  Si  the  event  that  the  environment  is  in  state  i,  which  would  typically  have  a  known  steady- 
state  probability.  □ 

Example  7.2  This  example  serves  two  purposes,  the  first  to  provide  a  non-artificial  example  of 
a  regenerative  process  with  bounded  cycle  length  and  the  second  to  illustrate  the  use  of  coupling 
to  estimate  the  rate  of  convergence  to  the  steady-state.  We  considered  a  (s,  5)  inventory  system 
with  5  =  0,  5  =  1  and  a  demand  process  which  is  a  superposition  of  a  deterministic  unit  rate  and 
a  compound  Poisson  process  with  arrival  rate  (3  and  jumps  which  are  uniformly  distributed  on 
(0, 1/2).  Thus,  X  drifts  downwards  at  a  unit  rate  between  jumps  and  is  instantaneously  reset  to 
1  whenever  (-00, 0]  is  hit,  which  may  occur  either  continuously  at  zero  (no  jump)  or  through  a 
jump  of  size  exceeding  the  present  level  X{t)  of  stock.  Note  that  if  no  arrivals  occur  in  [u,  u  +  z), 
we  have  ^(1;-)  =  X{u)  -  z  (mod  1).  The  cycle  length  r  may  be  taken  as  the  time  spent 
between  consecutive  visits  to  state  1.  The  process  is  Markovian  but  simple  explicit  formulas 
for  the  stationary  distribution  enabling  X*(0)  (and  thereby  X')  to  be  simulated  by  standard 
methods  are  not  available.  However,  since  r  <  1  we  are  in  a  position  to  apply  Algorithm  D. 
For  example,  a  coupling  time  between  a  stationary  version  X‘  and  a  version  X  with  arbitrary 
initial  conditions  can  be  constructed  as  follows: 

Algorithm  G  (Generating  a  Coupling  Time  k  of  a  (s.5)  Inventory  System) 
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of  X”  by  Algorithm  A; 


1  Generate  the  length  0  =  T*(0)  of  the  first  cycle  {-V*(0}o<t‘(o) 

r*  - 1. 

2  Generate  X  over  the  time  inierua/ [0,T*(0)]  by  independent  simulation; 

Y  -  A'(r*(o)),  t  -  r*(0). 

3  Generate  Z,  an  exponential  r.v.  with  rate  0; 

<  -  /  +  Z,  r  -  y  -  Z  (mod  1),  y*  -Z  (mod  1) 

4  IfY  <  Y',  let  a^Y'  -\,b^  V'; 
else  a^Y  -\,b^  Y' 

5  Generate  V,  V'*,  independent  r.v.’s  on  (0,^); 

y  -  y  -  V',  y  -  y*  -  v 

6  If  a  <  Y  <  b  and  a  kY’  <  b,  then  k  <—  t; 
else  return  to  3 


We  took  0  —  1  and  0  =  2,  and  repeated  the  experiment  500  times  for  each  value  of  0. 
Figure  1  g’ves  the  empirical  values  >  t)/500  of  the  survival  probabilities  P(k  >  t) 

giving  an  upper  bound  on  the  rate  of  convergence  to  stationaxity,  cf.  (6.6),  and  shows  the 
expected  tendency  of  slower  coupling  (lower  rate  of  convergence  to  stationarity)  when  0  =  1. 
A  simple  measure  of  this  tendency  is  also  the  observed  empirical  means  of  k  which  were  5,34 
and  1.49,  respectively.  Roughly,  we  may  also  conclude  that  the  process  has  become  for  all 
practical  purposes  stationary  at  time  t  =  10  when  0  =  1,  whereas  the  corresponding  t-value 
is  probably  much  higher  when  0  =  2.  Note,  however,  that  coupling  methods  typically  produce 
only  upper  bounds  and  that  the  interpretation  of  estimates  relating  to  the  coupling  epoch  k  for 
this  and  other  reasons  seem  to  require  some  care  (in  our  opinion,  not  least  when  the  mean  Ek 
is  studied!),  no  matter  whether  such  results  are  obtained  from  theory  as  say  in  [2],  [3]  and  [1]  or 
from  simulation  experiments  like  here.  Our  point  here  is,  however,  not  to  discuss  these  issues, 
only  to  point  out  that  in  fact  in  some  cases  simulation  presents  a  feasible  approach.  □ 

Example  7.3  For  a  simple  yet  non-trivial  case  where  one  can  actually  simulate  a  r.v.  having  the 
stationary  age  distribution,  cf.  Algorithm  E,  consider  the  M/G/1  queue.  If  the  queue  discipline 
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is  FIFO  (First  In  First  Out),  enough  is  known  about  the  steady-state  distribution  to  allow  a 
stationary  version  of  the  system  to  be  simulated.  For  example,  if  the  service  time  is  phase-type 
([21]),  then  so  is  the  steady-state  waiting  time  V'  ([21])  and  hence  straightforward  to  generate  as 
initial  value  for  the  simulation.  In  general,  we  may  use  the  Pollaczek-Khintchine  formula  V'  = 

/?!  + - 1-  /Za?,  where  N  is  geometric  with  rate  p  (the  traffic  intensity)  and  fZi,  /Z2.  •  ■  •  are  i.i.d. 

with  common  density  (1  -  B(x))lp,  where  B  is  the  service  time  distribution  and  p  its  mean; 
typically,  this  density  will  be  simulatable  given  a  specific  form  of  B. 

Of  course,  in  the  FIFO  case  the  need  to  simulate  at  all  is  questionable.  However,  if  instead  we 
are  dealing  with  some  other  work-conserving  discipline  like  some  variant  of  processor-sharing 
or  priority  queueing  it  will  only  rarely  be  the  case  that  full  information  on  the  steady-state 
distribution  is  available.  What  one  could  do  is  then  to  note  that  by  work-conservation,  the 
cycle  length  r  has  the  same  distribution  as  for  the  FIFO  case.  To  generate  a  r.v.  0  having 
the  stationary  age  distribution  as  required  in  step  2  of  Algorithm  B,  one  thus  simply  starts  a 
stationary  FIFO  system  and  let  0  be  its  first  stationary  cycle  length,  noting  that  the  stationary 
distributions  of  the  age  and  the  excess  life  of  the  cycle  are  the  same.  □ 


Example  7.4  Our  purpose  is  here  to  present  a  further  non-trivial  case  where  one  can  actually 
simulate  a  r.v.  having  the  stationary  age  distribution,  cf.  Algorithm  E.  We  take  {.V„}  as  a 
discrete  time  Harris  recurrent  Markov  process,  say  with  state  space  S  and  n-step  transition 
probabilities  K'^{x,A)  =  IP(A'„  €  AjA'o  =  x)  on  w’hich  we  impose  the  minorization  (5.1). 

The  splitting  argument  for  Harris  chains  (see  [4]  for  the  theoretical  background  and  [13] 

for  the  simulation  implementation)  now  states  that  following  each  n  =  0.m.2m . we  may- 

let  a  regeneration  occur  at  time  n  +  m  w.p.  A.  In  this  way  we  obtain  the  distribution  of  the 

zero-delayed  cycle  length  r  as  geometric  on  a  lattice,  P(r  =  im)  =  e(l  -  e)'"’,  i  =  1,2, _ 

From  this  it  follows  that  the  stationary  age  distribution  is  given  by 

lP(r’(0)  =  tm -f  j)  =  — ,  i=l,2,...,  ;  =  0,1, - m  -  1, 

771 

which  is  of  a  form  allowing  for  straightforward  computer  generation.  Note  that  when  m  >  1, 
this  is  a  wide-sense  regenerative  process  that  does  not  (in  general)  have  i.i.d.  cycles  (they  exhibit 
dependence).  □ 


Example  7,5  Consider  the  M/G/s/A  loss  system,  that  is,  a  s-server  system  with  Poisson 
arrivals  at  rate  a,  service  time  distribution  B  and  allowing  at  most  N  customers  present  in  the 
system  at  any  time;  new  customers  arriving  when  N  are  present  are  lost.  The  queue  discipline  is 
work-conserving  but  otherwise  arbitrary.  This  model  will  serve  in  part  as  a  concrete  illustration 
of  the  idea  of  Algorithm  E,  and  in  part  to  illustrate  the  phenomenon  of  continuous  time  but 
lattice  cycle  length  distribution. 

We  shall  need  to  impose  a  mild  condition  on  the  tail  of  B,  namely 


B(x  -b  yo) 
B(x) 


>6, 


X  >  Xo, 


(7.9) 


for  some  ^  >  0,  Xo,j/o  <  00.  For  example  (7.9)  holds  if  liminf6(x)  >  0  where  6(x)  is  the  failure 
rate  of  B  at  x.  With  m  =  xq  +  Voi  it  immediate  follows  that  the  probability  B{x  +  m)/B(x) 
of  service  termination  before  z  given  x  units  of  service  has  been  attained  is  bounded  below  by 
6  for  all  X  >  0.  Hence,  no  matter  the  initial  state  of  the  system,  there  is  a  probability  of  at 
least  f  =  that  the  system  will  be  empty  at  time  m.  The  following  algorithm  describes 

how  to  generate  a  regenerative  cycle,  with  the  cycle  length  distribution  being  geometric  on  the 
lattice  {m,2m, . . .},  P(.Y  =  im)  =  f(  1  -  c)'~*,  i  =  1,2, . . .; 
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Algorithm  H  (Generating  a  Regenerative  Geometric  Cycle  of  a  M/G/s/N  Loss 
System) 

1  <  0,  system  state  <—  idle 

2  Generate  the  system  over  the  interval  (<,  t  +  m] 

3  If  system  state  =  idle,  return  to  2; 
else,  go  to  4 

4  Generate  a  0-1  random  variable  U  with  =  1)  =  c.  If  U  =  1,  go  to  6; 
else,  if  U  =  0,  go  to  5 

5  Generate  a  further  0-1  random  variable  V  with  P(V'  =  1)  =  £.  IfV  =  1.  reject  the  sample 
path  generated  in  the  last  step  of  2  and  return  to  2; 

else,  if  V  =  0,  then  t  ■<—  t  m  and  return  to  2. 

6  r  •—  t  +  m 

To  simulate  the  first  stationary  cycle  of  the  system,  one  constructs  /3  in  step  1  of  Algorithm 
E  by  simulating  Oo,  W  where  P(/3o  =  im)  =  f(l  —  c)*”*,  i  =  1,2, . . ..  and  W  is  uniform  on  (0,  m), 
and  putting  /3  =  /?o  -  HL  The  rest  then  follows  Algorithm  E,  using  Algorithm  H  to  simulate 
the  cycles  of  the  zero-delayed  process.  □ 


8  Appendix:  proofs 


Proof  of  Proposition  2.2  Because  tt  is  not  a  unit  point  mass  distribution,  there  exists  a  probability 
distribution  u  ir  such  that  tr  and  u  are  equivalent  (i.e.,  share  the  same  sets  of  probability 
zero).  Hence  =  0)  =  1  so  that  Pi/,k’(AV  €  •)  =  ‘^(’)-  Since  v  ^  ir,  this  implies  that  T 

is  not  a  stationarity  detection  time  for  A',  yielding  a  contradiction.  □ 

Proof  of  Proposition  2.3  Assume  that  Z  is  a  e-stationarity  detection  rule  in  the  class 
associated  with  the  randomized  stopping  time  a,  and  choose  ^  >  0  such  that  2(1  -  5)  >  c.  Let 
A  be  some  large  integer  to  be  specified  later,  and  define  €  A'’(A'^°))  by 


Q  +  ( 1  - 

(l-a)fc!? 


i  <  A 

i  >  Q,  i  =  j  , 
i>  A,  ii^  j 


0  <  Q  <  1.  For  0  <  Q  <  1,  write  =  Pq.a'Io)  fo*" 

distribution  of  {A„}  starting  from  A'o  =  0,  and  let  r  =  inf  {n  >  1  :  .Y„  =  0}.  Then,  letting 
M  =  max  {X„  :  0  <  n  <  <t},  it  is  easy  to  see  that 


E<®>r  >  -^p(°’(A^  >  A), 


1  -  a 


T-l 

'E 

nsO 


=  7(X,  =  i).  i</t. 


E<“)i 


n=0 


(8.10) 

(8.11) 


In  particular,  (8.10)  converges  to  oo  and  (8.11)  to  0  as  q  t  1,  and  hence  for  some  q  we  have 
x^®*({0, . . . ,  A})  <  6/2.  Choose  A  such  that  pl®)(Z  <  A,M  <  A)  >  1  -  6/2.  Since  P*®^(Z  < 
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A,M  <  /I)  =  <  A,  M  <  A)  by  construction,  we  get 

Ilip(“)(^e  •)-’r<“)(-)ll  >  |lP<“>(Z  <  >1)- 7r<")({0,...,>l})| 

>  |IP(“>(Z  < /1,A/ <  y4)  -  7r<")({0 . >4})| 


>  2(1 


a  contradiction. 


Proof  that  Algorithm  F  is  valid  Suppose  that  the  cycle  length  r  is  not  hounded  by  the  constant 
a,  and  consider  the  r.v.  T3(0)  having  distribution 


1P(T„(0)  €  dx)  = 


xl(0  <  X  <  a)lP(r  €  dx) 
IE[r;0  <  T  <  a] 


We  note  that  the  r.v.  Tq(0)  can  be  generated  by  looking  at  the  subsequence  of  the  r, 's  for  which 
Ti  <  a  and  then  applying  to  the  subsequence  the  ‘length  biased’  acceptance-rejection  technique 
of  Algorithm  A  (see  step  4).  If  (4.9)  is  in  force,  then  (for  any  measurable  set  B) 

|lP(x'(f/T'(0)  +  •)  €  5)  -  IP(X-  €  5)| 

=  |P(X'(f/T^(0)  +  •)  €  5)  -  W{X'iUT'(0)  £  B)| 

=  \r]P[x(Ut  +  -)eB\Ti  =  t]{iP{Ti(0}edt)-jp(T'{0)edt)) 

\Jo 

<  |lP(T<;(0)  €  dt)  -  lP(r'(0)  €  d<)| 

=  r  *  (iFr  n  <  i'  "  ^  +  r  ^ 

Jo  \E[r;0<r<a]  mj  Ja  m 

_  ^  E[r;0<r<a]  E[r;r  >  a] 

m  m 

_  2E[r;r  >  a] 

E^^  ■ 

We  further  note  that  if  Er  >  1,  then 

(8.12) 

E[rV2  .  >  a)| 

Er 

E'/V-E‘/2[r;r  >  a]  .r-  l  c  u 

<  - = -  (Cauchy  -  Schwarz) 

Et 

^  Er  •E'/^[r;r  >  a]  ^  , 

<  - — ^ - -  (since  Er  >  1) 

Er 

=  E*/^[r;r>a], 


(8.12) 


(Cauchy  -  Schwarz) 


(since  Er  >  1) 


whereas  (8.12)  is  bounded  by  E[r;r  >  a]  if  Er  <  1.  But 

E[r;r  >  a]  <  E[(r/a)’’r;r  >  a]  <  a^'^Er'’"''*. 

So,  suppose  that  an  upper  bound  7  on  Er’’"*'*  can  be  computed.  Then,  for  any  positive  f  <  1, 
we  can  choose  a  =  a(e)  as  in  step  1  of  Algorithm  F.  By  doing  so.  this  guarantees  that 

2sup  P{X'(UT'J0)  +  •)  €  fl)  -  E(X*  £  B)  <  e. 

R 
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□ 


Proof  of  Proposition  6.1  Obviously,  t*  is  the  sum  of  the  cycle  lenghts  up  to  and  including  the 
first  accepted  cycle,  and  since  a/m  is  the  acceptance  probability,  it  follows  by  Wald's  identity 
that  Er^  =  Er  ■  a/  m  =  a.  Also,  it  is  standard  (or  seen  by  a  simple  conditioning  argument 
based  upon  Proposition  3.1)  that 

VarA'*(0)  =  VarAV,ot  =  — E  -  Q)^ds  . 

'  ’  m  [Jo 

Thus, 

a?  =  - 1-  =  — E  /  (A"(s)  -  a)ds  ,  =  VarA'‘(0)  •  Er#  =  — E  /  (A'(s)  -  afds. 

m  m  [Jo  m  Jo 

However,  let  t  be  fixed  and  let  U  be  uniform  on  [0,<].  Then 

E[(A(r)-Q)2|A'j  >  E2[(A'(f/)-a)|A']  conditionally  upon  A',  (8.14) 

^  -  afds  >  J^(X{s)-  Q)ds  , 

a  JE  f  (X{s)  —  a)^ds  >  E  fr  /  (A'(s)  — >  E  /  (X(s)  —  a)ds  ,  (8.15) 
Jo  I  Jo  L./o 

where  (8.14)  follows  by  Cauchy-Schwarz  (equality  in  the  last  step  of  (8.15)  would  imply  A'(<)  =  ct 
for  all  t  €  [0,r)  which  is  impossible  because  we  have  excluded  the  case  A'(/)  =  a).  Inserting 
(8.15)  in  (8.13)  completes  the  proof.  □ 

Proof  of  Proposition  6.2  Let  T#(l)  =  rf  =  r#,  let  T#(2)  be  the  sum  of  the  cycle  lenghts  up 
to  and  including  the  second  accepted  cycle,  rf  =  T'#(2)-  r#(l)  and  so  on.  Then  the  T*(n)  are 
regeneration  points  for  X  (being  obtained  by  randomised  stopping  of  the  initial  regeneration 
points)  and  there  are  x(0  of  them  before  t.  Let  —  denote  averaging  from  1  to  \{t),  and  put 


Then  the  (Z,  ,  ,  A^),  i  =  1,2, . . .,  are  i.i.d.,  Oj  (t)  is  a  regenerative  estimator,  and  it  is  standard 

that  of  (1)  and  ai{t}  are  asymptotically  equivalent  in  the  sense  that  the  difference  is  o(>/i).  Thus, 
in  the  proof  we  may  replace  Qi(0  by  of  (t). 

Let  f(Z,r)  =  Z/r,  g(Z,T,  A)  =  (1  -  b)f{Z,  r)  +  bA  and  let  fz,  /t  and  gz,  gr,  gA  denote  the 
partial  derivatives  evaluated  at  (EZ#,Er#)  and  (EZ#,E7-#,EA),  respectively.  Then 

Qi(<)  =  <7(Z^,r#),  af(t)  =  f(Z*,T*,A),  /(EZ#,Er#)  =  5(EZ#,Er#,EA)  =  o, 

and  applying  the  Delta  method  shows  that  aymptotically 

ai(<)  ss  Q  + /^(Z^  -  EZ#)  + /r(T#  -  Er#), 

04(1)  ft:  a+gzi~2* -E.Z*)  +  gr{T* -'ET*}  +  g4(A-JEA) 

=  q  +  (1-6)(/z(:^-EZ#)  +  /^(r#-Er#))+6(A-EA). 

Interpreting  A  -  fzZ*  -  /tT"#  as  a  control  on  fzZ*  +  /tT#,  standard  results  on  linear  control 
variates  therefore  show  that  taking 

CovifzZ*  +  frT*,A-fzZ*-frT*) 

Var(A-/zZ#  -/.r#) 
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will  ensure  <t|  <  erf,  and  that  we  will  have  (Tg  <  <Tj  unless  the  covariance  in  the  definition  of  6 
vanishes;  if  for  nothing  else,  then  because  of  the  randomization  in  step  5  of  Algorithm  D  we  find 
it  hard  to  think  of  examples  where  this  can  happen. 

To  obtain  an  estimator  b^(t)  of  b,  just  estimate  the  3x3  covariance  matrix  of  {Zfr*,  A)  by 
the  empirical  covariance  matrix  (as  when  doing  regression-adjustion  for  linear  control  variates), 
note  that 

1  EZ# 

-  Er#  ’  ~  EV# 

can  be  estimated  by  inserting  the  empirical  means,  and  insert  these  estimates  in  the  definition 
of  6.  □ 
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